
improve gym env / tune hps #71

Merged
merged 33 commits into master from sanity-check-robot on Feb 1, 2024
Conversation

Armandpl
Owner

@Armandpl Armandpl commented Feb 1, 2024

  • set up new reward functions and sweep over them (a sketch of the two reward shapes is given after this list):
    • nothing very conclusive, each of them seems to converge in about the same time and lead to a working policy
      • slight doubt about the cos reward because one of the runs converged to a bad policy. todo (later): run more runs comparing the cos reward to another one and look at the resulting policies.
    • the idea was to use the exponential to get higher rewards above pi/2 and try to get the agent to converge faster. That seems to be the case, but the exp also makes the reward flatter close to 0, which seems to slow down early exploration.
  • set up a tiny sweep over TQC hyperparameters (see the sweep sketch below):
    • tiny sweep because each run takes ~10 min, so we can't sweep over that many hyperparameters
    • using sbx and JAX could speed it up, but I couldn't cleanly set up JAX CUDA with poetry
    • found that a higher learning rate converges faster; the other hyperparameters didn't seem to have an effect. todo (later): use rliable to properly evaluate sweep results!
  • add raw angles to the obs when limits depend on them, so as not to break the Markov assumption (see the wrapper sketch below)
  • add a PID to slow down the motor on reset, trying to gain some wall time during training (see the PID sketch below). It is currently badly tuned and may have damaged the motor; this needs further investigation
  • delete unused code: scripts/train_sac.py and scripts/robot_inference.py
  • add a DeadZone wrapper (see the sketch below); ideally we'll get rid of it once we figure out why the agent doesn't explore properly without it
  • debug an issue where it seemed gSDE wasn't being used with TQC on sbx (see the gSDE note below)
    • we were updating the policy at each step, which made the exploration noise look like Gaussian noise
  • set up training on Mac
  • fix a bug in the FurutaReal env reset: it only worked between -pi and pi and didn't account for the pendulum doing multiple turns, which meant we sometimes waited until the reset timeout when we didn't need to (see the angle-wrapping sketch below)
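Reward shapes: a minimal sketch of the two reward families compared above; the exact formulas, angle convention, and scaling used in the repo may differ, and the function names are illustrative.

```python
import numpy as np

# theta is the pendulum angle, with theta = 0 meaning upright (assumed convention).
def cos_reward(theta: float) -> float:
    # 1 when upright, -1 when hanging down; changes fastest around the horizontal.
    return float(np.cos(theta))

def exp_reward(theta: float, scale: float = 2.0) -> float:
    # Peaks sharply near upright (rewards grow quickly above pi/2) but is almost
    # flat when the pendulum hangs down, which can make early exploration slower.
    return float(np.exp(-scale * np.abs(theta)))
```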
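Sweep sketch: a minimal example of what a learning-rate sweep over TQC could look like, using sb3_contrib's TQC and a stand-in gym env; the env id, timesteps, and swept values here are placeholders, not the repo's actual sweep.

```python
import gymnasium as gym
from sb3_contrib import TQC  # sbx exposes a TQC with a similar interface

for lr in (3e-4, 7e-4, 1e-3):  # illustrative values, not the grid used in the sweep
    env = gym.make("Pendulum-v1")  # stand-in for the Furuta env
    model = TQC("MlpPolicy", env, learning_rate=lr, verbose=0)
    model.learn(total_timesteps=50_000)
    model.save(f"tqc_lr_{lr:.0e}")
```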
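Wrapper sketch: a hedged illustration of the raw-angle idea. If the observation only contains wrapped angles (or cos/sin) but the safety limits depend on the unwrapped angles, appending the raw angles keeps the observation Markovian. The attribute names on the underlying env are hypothetical.

```python
import numpy as np
import gymnasium as gym

class RawAngleObservation(gym.ObservationWrapper):
    """Append the unwrapped motor/pendulum angles to the observation."""

    def __init__(self, env):
        super().__init__(env)
        low = np.concatenate([env.observation_space.low, [-np.inf, -np.inf]]).astype(np.float32)
        high = np.concatenate([env.observation_space.high, [np.inf, np.inf]]).astype(np.float32)
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)

    def observation(self, obs):
        # motor_angle / pendulum_angle are hypothetical attributes holding the
        # unwrapped angles tracked by the underlying env.
        raw = np.array(
            [self.unwrapped.motor_angle, self.unwrapped.pendulum_angle],
            dtype=np.float32,
        )
        return np.concatenate([obs, raw])
```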
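PID sketch: a rough illustration of the reset idea, a small PID loop driving motor velocity to zero before the next episode starts so we don't wait for the motor to coast down. Gains, timestep, and the surrounding motor interface are placeholders, not the repo's actual API.

```python
class VelocityPID:
    """Drive the motor velocity to zero during reset (illustrative gains)."""

    def __init__(self, kp: float = 0.5, ki: float = 0.0, kd: float = 0.05, dt: float = 0.02):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, velocity: float) -> float:
        error = -velocity  # setpoint is zero velocity
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # The returned value would be sent as the motor command by the reset loop.
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```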
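DeadZone sketch: a hedged guess at what the wrapper does, remapping small action magnitudes past the band where the motor doesn't move so that tiny exploratory actions still produce motion. The threshold and the exact remapping in the repo may differ.

```python
import numpy as np
import gymnasium as gym

class DeadZone(gym.ActionWrapper):
    """Rescale actions so their magnitude skips the motor's dead zone."""

    def __init__(self, env, deadzone: float = 0.2):
        super().__init__(env)
        self.deadzone = deadzone

    def action(self, act):
        act = np.asarray(act, dtype=np.float32)
        # Map |act| in (0, 1] onto (deadzone, 1]; zero stays zero because sign(0) == 0.
        return np.sign(act) * (self.deadzone + (1.0 - self.deadzone) * np.abs(act))
```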
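gSDE note: with `use_sde=True` the exploration noise is a function of policy features, so if the policy weights change every step the noise decorrelates and is hard to tell apart from plain Gaussian noise. A hedged illustration of the relevant knobs in sb3_contrib's TQC follows; this is not necessarily the exact fix applied here, and the env id is a stand-in.

```python
from sb3_contrib import TQC  # sbx exposes a TQC with a similar interface

model = TQC(
    "MlpPolicy",
    "Pendulum-v1",        # stand-in env id
    use_sde=True,         # state-dependent exploration instead of i.i.d. Gaussian noise
    sde_sample_freq=-1,   # resample the noise matrix once per rollout
)
```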
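Angle-wrapping sketch: the reset fix boils down to folding an angle that may have accumulated several turns back into [-pi, pi) before checking whether the pendulum has settled, so a few extra rotations don't make reset wait for the timeout. Names, the angle convention, and the settling check below are illustrative.

```python
import math

def wrap_angle(angle: float) -> float:
    """Map an angle that may span multiple turns into [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def pendulum_settled(raw_angle: float, tol: float = 0.1) -> bool:
    # raw_angle measured from upright; the pendulum rests near +/- pi (hanging down).
    return abs(wrap_angle(raw_angle)) > math.pi - tol
```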

@Armandpl Armandpl linked an issue Feb 1, 2024 that may be closed by this pull request
@Armandpl Armandpl changed the title from improve gym env / tune hp to improve gym env / tune hps on Feb 1, 2024
@Armandpl Armandpl merged commit f12a1da into master Feb 1, 2024
2 checks passed
@Armandpl Armandpl deleted the sanity-check-robot branch February 1, 2024 16:20
Development

Successfully merging this pull request may close these issues.

sanity check by training with sac or tqc